Skip to content

Conversation

@trevor-wilson-polygonio
Copy link
Collaborator

Cherry pick apache#19918

)

- Closes apache#19917.

Reduce the number of rows retrieved by pushing down more filters when
possible. Example:

```sql
create table t1 (k int, v int);
create table t2 (k int, v int);

-- k=1 is pushed to t1 and t2
explain select * from t1 left join t2 on t1.k = t2.k where t1.k = 1;
+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐                              |
|               | │        HashJoinExec       │                              |
|               | │    --------------------   │                              |
|               | │      join_type: Left      ├──────────────┐               |
|               | │        on: (k = k)        │              │               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      ││      RepartitionExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │ partition_count(in->out): ││ partition_count(in->out): │ |
|               | │          1 -> 12          ││          1 -> 12          │ |
|               | │                           ││                           │ |
|               | │    partitioning_scheme:   ││    partitioning_scheme:   │ |
|               | │      Hash([k@0], 12)      ││      Hash([k@0], 12)      │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │         FilterExec        ││         FilterExec        │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │      predicate: k = 1     ││      predicate: k = 1     │ |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      ││       DataSourceExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │          bytes: 0         ││          bytes: 0         │ |
|               | │       format: memory      ││       format: memory      │ |
|               | │          rows: 0          ││          rows: 0          │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
```

- Changed `push_down_all_join` to push down inferred predicates
independently of `left_preserved`/`right_preserved` semantics.
- Added unit tests.

Yes.

No.

---------

Co-authored-by: xudong.w <wxd963996380@gmail.com>
@xudong963
Copy link
Collaborator

Upstream removed the CoalesceBatchesExec node in DF52 apache#19622; we haven't upgraded to it.

I took a look at the broken tests, looks like all of them add some CoalesceBatchesExec nodes. (@trevor-wilson-polygonio You need to double-check that).

If so, I think it's okay for us to update these tests. After upgrading to DF52, these tests will align with upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants